The Effects of Pruning Methods on the Predictive Accuracy of Induced Decision Trees
Abstract
Several methods have been proposed in the literature for decision tree (post-)pruning. This article presents a unifying framework according to which any pruning method can be defined as a four-tuple (Space, Operators, Evaluation function, Search strategy), and the pruning process can be cast as an optimization problem. Six well-known pruning methods are investigated by means of this framework, and their common aspects, strengths and weaknesses are described. Furthermore, a new empirical analysis of the effect of post-pruning on both the predictive accuracy and the size of induced decision trees is reported. The experimental comparison of the pruning methods involves 14 datasets and is based on the cross-validation procedure. The results confirm most of the conclusions drawn in a previous comparison based on the holdout procedure. Copyright © 1999 John Wiley & Sons, Ltd.
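As a rough illustration of the four-tuple view, the sketch below casts one simple post-pruning scheme as a search problem. It is not taken from the article: the `Node` class, the function names, the toy tree and the pruning set are all hypothetical, and the scheme chosen (bottom-up pruning that minimises errors on a separate pruning set, in the style of reduced error pruning) is only one assumed instance of the framework. Here the Space is the set of subtrees obtainable from the full tree, the Operator replaces an internal node with a leaf, the Evaluation function counts misclassifications on the pruning set, and the Search strategy is a greedy bottom-up traversal.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    majority: int                   # majority class of the training cases at this node
    feature: Optional[int] = None   # index of the tested feature; None marks a leaf
    threshold: float = 0.0
    left: Optional["Node"] = None   # branch for x[feature] <= threshold
    right: Optional["Node"] = None  # branch for x[feature] > threshold

    def is_leaf(self) -> bool:
        return self.feature is None

def predict(node, x):
    while not node.is_leaf():
        node = node.left if x[node.feature] <= node.threshold else node.right
    return node.majority

def errors(node, data):
    """Evaluation function (assumed): misclassifications on the pruning set."""
    return sum(predict(node, x) != y for x, y in data)

def size(node):
    """Number of nodes in the tree."""
    return 1 if node.is_leaf() else 1 + size(node.left) + size(node.right)

def prune(node, data):
    """Search strategy (assumed): greedy, bottom-up.
    Operator: replace an internal node with a leaf labelled by its majority class."""
    if node.is_leaf():
        return node
    left_data = [(x, y) for x, y in data if x[node.feature] <= node.threshold]
    right_data = [(x, y) for x, y in data if x[node.feature] > node.threshold]
    subtree = Node(node.majority, node.feature, node.threshold,
                   prune(node.left, left_data), prune(node.right, right_data))
    leaf = Node(node.majority)
    # Keep the leaf when it is at least as accurate as the (already pruned) subtree.
    return leaf if errors(leaf, data) <= errors(subtree, data) else subtree

# Hypothetical toy tree whose right subtree contains an overfitted split.
full = Node(0, 0, 0.5,
            Node(0),
            Node(1, 1, 0.5, Node(1), Node(0)))
pruning_set = [([0.2, 0.1], 0), ([0.3, 0.9], 0),
               ([0.8, 0.2], 1), ([0.9, 0.8], 1), ([0.7, 0.9], 1)]
pruned = prune(full, pruning_set)
print(size(full), errors(full, pruning_set))      # 5 2
print(size(pruned), errors(pruned, pruning_set))  # 3 0
```

Swapping in a different `errors` function or a different traversal order yields a different point in the same design space, which is the sense in which the framework treats pruning as an optimization problem.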
Similar resources
Comparison of Ordinal Response Modeling Methods like Decision Trees, Ordinal Forest and L1 Penalized Continuation Ratio Regression in High Dimensional Data
Background: Response variables in most medical and health-related research have an ordinal nature. Conventional modeling methods assume predictor variables to be independent, and consider a large number of samples (n) compared to the number of covariates (p). Therefore, it is not possible to use conventional models for high dimensional genetic data in which p > n. The present study compared th...
Predicting The Type of Malaria Using Classification and Regression Decision Trees
Maryam Ashoori (1, *), Fatemeh Hamzavi (2). (1) School of Technical and Engineering, Higher Educational Complex of Saravan, Saravan, Iran; (2) School of Agriculture, Higher Educational Complex of Saravan, Saravan, Iran. Abstract. Background: Malaria is an infectious disease infecting 200 - 300 million people annually. Environme...
Study of Various Decision Tree Pruning Methods with their Empirical Comparison in WEKA
Classification is an important problem in data mining. Given a data set, a classifier generates a meaningful description for each class. Decision trees are among the most effective and widely used classification methods. There are several algorithms for the induction of decision trees. The trees are first induced, and then subtrees are pruned in a subsequent pruning phase to improve accuracy and prevent overfitting. In...
An Efficient Predictive Model for Probability of Genetic Diseases Transmission Using a Combined Model
In this article, a new combined approach of a decision tree and clustering is presented to predict the transmission of genetic diseases. The performance of these algorithms is compared for more accurate prediction of disease transmission under the same conditions, based on a series of measures such as the positive predictive value, negative predictive value, accuracy, sensitivit...
Application of classification trees-J48 to model the presence of roach (Rutilus rutilus) in rivers
In the present study, classification trees (CTs-J48 algorithm) were used to study the occurrence of roach in rivers in Flanders (Belgium). The presence/absence of roach was modelled based on a set of river characteristics. The predictive performance of the CTs models was assessed based on the percentage of Correctly Classified Instances (CCI) and Cohen's kappa statistics. To find the best model...